PclR is a transcriptional activator of the gene that encodes the pneumococcal collagen-like protein PclA

The Gram-positive bacterium Streptococcus pneumoniae is a major human pathogen that shows high levels of genetic variability. The pneumococcal R6 genome harbours several gene clusters that are not present in all strains of the species. One of these clusters contains two divergent genes, pclA, which encodes a putative surface-exposed protein that contains large regions of collagen-like repeats, and spr1404 (here named pclR). PclA was shown to mediate pneumococcal adherence to host cells in vitro. In this work, we demonstrate that PclR (494 amino acids) is a transcriptional activator. It stimulates transcription of the pclA gene by binding to a specific DNA site upstream of the core promoter. In addition, we show that PclR has common features with the MgaSpn transcriptional regulator (493 amino acids), which is also encoded by the R6 genome. These proteins have high sequence similarity (60.3%), share the same organization of predicted functional domains, and generate multimeric complexes on linear double-stranded DNAs. However, on the PpclA promoter region, MgaSpn binds to a site different from the one recognized by PclR. Our results indicate that PclR and MgaSpn have similar DNA-binding properties but different DNA-binding specificities, pointing to a different regulatory role of both proteins.

The core genome of a given bacterial species contains genes shared by all strains. In addition, bacterial genomes often harbour a variable number of genes that are present in one or more, but not all, strains of the species. These accessory genes contribute to the high degree of genetic variability found in many bacterial species. The function of the accessory genes can be very diverse, including a wide range of adaptive traits that might be beneficial for the bacteria under certain environmental situations 1. The Gram-positive bacterium Streptococcus pneumoniae (the pneumococcus) is a major human pathogen that shows high levels of genetic diversity. It is normally found as a harmless commensal of the upper respiratory tract (mainly the nasopharynx). Nevertheless, in individuals with a weakened immune system, the pneumococcus can migrate to other tissues/organs and cause life-threatening diseases, such as pneumonia, bacteraemia, and meningitis 2,3. Despite the development of different vaccines and antibiotic therapies, S. pneumoniae remains a leading cause of morbidity and mortality worldwide, being the most common cause of bacterial pneumonia in children under five years old (https://www.who.int/en/news-room/fact-sheets/detail/pneumonia). An interesting aspect of S. pneumoniae is its capacity to incorporate exogenous DNA into its genome, which is mainly achieved by horizontal gene transfer mechanisms 4,5 and plays an important role in its adaptation and evolution. Comparative genomic analyses have shown that over 20% of the coding sequences of any single pneumococcal isolate are not present in all strains [5][6][7]. Furthermore, it has been estimated that the rate at which the pneumococcus acquires genetic variation through recombination is much higher than the rate at which random mutations occur 8.
The genome sequences of the pneumococcal strains TIGR4 (serotype 4) and R6 (a derivative of D39, serotype 2) were published in 2001 9,10 . A comparison of both sequences revealed that, among other differences, the R6 genome has six gene clusters that are absent from the TIGR4 genome 11 . One of the R6-specific clusters (9634 bp) consists of two divergent genes, spr1403 (new locus tag: SPR_RS06970) and spr1404 (new locus tag: SPR_RS06975). The spr1403 gene encodes a putative cell wall anchored protein that contains large regions of

Results
Organization of predicted functional domains in PclR. The genome of the pneumococcal R6 strain 10 has several gene clusters that are absent from other pneumococcal genomes. One of them consists of the spr1403 gene (pclA, pneumococcal collagen-like protein A) 12 and the spr1404 gene (here named pclR) (Fig. 1). The ATG codon at coordinate 1,388,136 is likely the translation initiation codon of the pclR gene, as it is preceded by a putative Shine-Dalgarno sequence (5′-GGAGGAAA-3′). Translation from this ATG codon results in a protein of 494 residues (PclR). EMBOSS Needle Pairwise Sequence Alignment 25,26 of PclR and the pneumococcal MgaSpn transcriptional regulator (493 residues; locus_tag spr1622) revealed that these proteins have 60.3% similarity and 40.1% identity (Supplementary Fig. S1). According to the Conserved Domain Database (CDD) 27 and the Protein Families Database (Pfam) 28, PclR is predicted to have (i) two N-terminal helix-turn-helix DNA-binding domains, the so-called HTH_Mga (Family PF08280.14, residues 6 to 65) and Mga (Family PF05043.16, residues 72 to 158) domains, and (ii) a central phosphoenolpyruvate phosphotransferase system (PTS) regulation domain (PRD) (Family PRD_Mga PF08270.14, residues 174 to 391) (Supplementary Fig. S2). Moreover, the protein structure prediction server Phyre2 29 revealed that the C-terminal region of PclR (residues 398 to 488) has structural homology to a PTS EIIB-like component. Thus, the organization of predicted functional domains in PclR is similar to the one reported for MgaSpn 15,24. Supplementary Fig. S3 shows the predicted three-dimensional structure of the PclR monomer according to the AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) 30,31, as well as the location of the predicted functional domains on such a structure. The AlphaFold Database predicts similar three-dimensional structures for the PclR and MgaSpn monomers (Supplementary Fig. S4).
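As an illustration of how the identity and similarity percentages of a global pairwise alignment are obtained, the following minimal sketch counts matching columns over the full alignment length (gaps included). It is not the EMBOSS Needle implementation: the function names, the toy aligned sequences, and the `SIMILAR_GROUPS` residue grouping are hypothetical, and the grouping only stands in for the positive-scoring substitutions that a real scoring matrix would define.

```python
# Hypothetical residue groups treated as "similar" (an assumption for this
# sketch; a real alignment program derives similarity from a scoring matrix).
SIMILAR_GROUPS = ("KR", "DE", "ILVM", "FYW", "ST", "NQ")


def _check(aligned_a: str, aligned_b: str) -> None:
    """Both rows of an alignment must be non-empty and equally long."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    _check(aligned_a, aligned_b)
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def percent_similarity(aligned_a: str, aligned_b: str,
                       groups=SIMILAR_GROUPS) -> float:
    """Identical or same-group columns as a percentage of the alignment length."""
    _check(aligned_a, aligned_b)

    def similar(x: str, y: str) -> bool:
        return x == y or any(x in g and y in g for g in groups)

    hits = sum(1 for x, y in zip(aligned_a, aligned_b)
               if "-" not in (x, y) and similar(x, y))
    return 100.0 * hits / len(aligned_a)


# Toy alignment (not the PclR/MgaSpn alignment): 6 columns, one gap.
row_a = "MKT-LL"
row_b = "MRTALL"
print(round(percent_identity(row_a, row_b), 1))
print(round(percent_similarity(row_a, row_b), 1))
```

Because similar columns include identical ones, the similarity value is never lower than the identity value, which is consistent with the 60.3% and 40.1% figures reported above.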
Expression of the pclR gene under laboratory conditions. By quantitative RT-PCR (qRT-PCR) assays and using the comparative $C_T$ method 32, we determined the relative expression of the pclR gene in pneumococcal R6 cells grown under standard laboratory conditions: AGCH medium supplemented with 0.2% yeast extract and 0.3% sucrose, 37 °C, and without aeration. Compared to the stationary phase, transcription of pclR was found to be higher (~ 3.2-fold) at the logarithmic growth phase (Supplementary Table S1). We also determined the relative expression of the regulatory mgaSpn gene. Like pclR, transcription of mgaSpn was higher (~ 4.3-fold) in exponentially growing R6 cells (Supplementary Table S2). […] on the intracellular levels of pclA transcripts 12. This finding suggested to us that, under laboratory conditions, an increase in the expression of the pclR gene could be necessary to detect an effect on the transcription of the pclA gene. Therefore, to test this hypothesis (see below), we constructed two R6 derivative strains designed to produce different levels of PclR. Specifically, we inserted the promoterless pclR gene into the pDLF constitutive expression vector 20 in both orientations, generating the recombinant plasmids pDLFpclR (expression of pclR) and pDLFpclR-i (no expression of pclR). Then, we introduced each recombinant plasmid into the R6 strain. By qRT-PCR, we determined the relative expression of the pclR gene in both strains: R6/pDLFpclR (expression of pclR from the chromosome and the plasmid) and R6/pDLFpclR-i (expression of pclR only from the chromosome). As expected, the amount of pclR transcripts was higher (~ 3.1-fold) in strain R6/pDLFpclR (Supplementary Table S3). Each recombinant plasmid was also introduced into the R6∆mga mutant strain, which lacks the mgaSpn gene 15.
As shown in Supplementary Table S4, the amount of pclR transcripts was higher (~ 4.9-fold) in strain R6∆mga/pDLFpclR compared to strain R6∆mga/pDLFpclR-i. In the next sections, we will refer to R6/pDLFpclR and R6∆mga/pDLFpclR as strains with high levels of pclR expression, and to R6/pDLFpclR-i and R6∆mga/pDLFpclR-i as strains with low levels of pclR expression.
Identification of the promoter of the pclR gene. The BPROM program (Softberry, Inc.) predicts a promoter sequence (named PpclR herein) upstream of the pclR gene. It has a canonical − 10 element (TATAAT) and a possible − 35 element (TTTATA) at the suboptimal spacer length of 19 nucleotides (Fig. 1). By transcriptional fusions, we analysed the promoter activity of such a sequence (Fig. 2A). A 185-bp DNA fragment (coordinates 1,387,937 to 1,388,121) was inserted into the pASTT promoter-probe vector, which is based on the gfp reporter gene. The recombinant plasmid (pASTT-PpclR) was then introduced into R6∆mga/pDLFpclR (high levels of pclR expression) and R6∆mga/pDLFpclR-i (low levels of pclR expression). In both strains, gfp expression was ~ 2.5-fold higher than the basal level (strains harbouring pASTT, 0.08 ± 0.02 units). Similar results were obtained with the plasmid pASTT-PpclR∆105 (Fig. 2A). These results showed that (i) the 80-bp region between coordinates 1,388,042 and 1,388,121 contains a promoter sequence, and (ii) different levels of pclR expression do not affect the activity of such a promoter (no autoregulation). Furthermore, no promoter activity was detected when the region between coordinates 1,388,099 and 1,388,121 was deleted (plasmid pASTT-PpclR∆-10) (Fig. 2A). Such a deletion removes the − 10 element of the PpclR promoter (see Fig. 1). The transcription start site of the pclR gene was identified by primer extension assays. We used total RNA from R6 cells and the oligonucleotide Dw1404-2 (coordinates 1,388,208 to 1,388,232) (Table 1). A cDNA product of 114 nucleotides was detected (Fig. 2C, lane 2), which could correspond to a transcription initiation event at coordinate 1,388,119. This coordinate is located 6 nucleotides downstream of the − 10 element of the PpclR promoter (Fig. 1). Additionally, we performed primer extension assays with total RNA from R6 cells harbouring pASTT-PpclR.
In this plasmid, the gfp reporter gene is under the control of the PpclR promoter (Fig. 2A). As a primer, we used the oligonucleotide Int-gfp (Table 1), which anneals to gfp transcripts (Fig. 2B). A cDNA product of 105 nucleotides was detected (Fig. 2C, lane 1), which could correspond to a transcription initiation event at coordinate 1,388,120. This coordinate is located 7 nucleotides downstream of the − 10 element of the PpclR promoter (Fig. 1). In addition to the mentioned cDNA products, a possible non-specific product of 121 nucleotides was detected in both primer extension reactions (Fig. 2C, lanes 1 and 2). From these results, we conclude that the pneumococcal RNA polymerase recognizes the PpclR promoter and initiates transcription at coordinate 1,388,119/1,388,120 (Fig. 1).
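The arithmetic that links a primer extension product to a transcription start coordinate can be sketched as follows. This is a minimal illustration, assuming that the 5′ end of the primer forms one end of the cDNA, that the cDNA extends to the first transcribed nucleotide, and that pclR is transcribed towards increasing chromosomal coordinates; the function name and the `forward` parameter are hypothetical.

```python
def tss_from_primer_extension(primer_5prime_coord: int,
                              cdna_length: int,
                              forward: bool = True) -> int:
    """Return the coordinate of the transcription start site.

    primer_5prime_coord: chromosomal coordinate of the 5' end of the primer.
    cdna_length: length (nucleotides) of the extended cDNA, primer included.
    forward: True if the gene is transcribed towards increasing coordinates,
             in which case the cDNA grows towards lower coordinates.
    """
    if cdna_length < 1:
        raise ValueError("cDNA length must be a positive number of nucleotides")
    offset = cdna_length - 1  # both end coordinates are inclusive
    return primer_5prime_coord - offset if forward else primer_5prime_coord + offset


# Dw1404-2 spans coordinates 1,388,208 to 1,388,232; the cDNA product was 114 nt.
print(tss_from_primer_extension(1_388_232, 114))
```

With the values given for the chromosomal transcript, the function returns 1,388,119, the coordinate reported above. The 105-nucleotide product obtained with Int-gfp cannot be recomputed in the same way here, because that primer anneals to plasmid-encoded gfp sequence whose coordinates are not given in the text.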
PclR activates the promoter of the pclA gene in bacterial cultures. By qRT-PCR assays and using total RNA from strains R6/pDLFpclR (high levels of pclR expression) and R6/pDLFpclR-i (low levels of pclR expression), we analysed the effect of PclR on the transcription of the pclA gene. Transcription of pclA was found to be higher (~ 3.4-fold) in the strain with high levels of pclR expression (Supplementary Table S5). Moreover, using total RNA from strains R6∆mga/pDLFpclR (high levels of pclR expression) and R6∆mga/pDLFpclR-i (low levels of pclR expression), we confirmed that the amount of pclA transcripts was higher (~ 4.5-fold) in the strain with high levels of pclR expression (Supplementary Table S6). These results indicated that PclR has a positive effect on the transcription of the pclA gene, both in the presence and in the absence of the MgaSpn regulator.
The ATG codon at coordinate 1,387,920 is likely the translation start site of the pclA gene (Fig. 1). Sequence analysis of the region located between coordinates 1,388,224 and 1,387,910 revealed the existence of a putative promoter (named PpclA herein), in which the − 35 (TTGATT) and − 10 (TACATT) elements are separated by 17 nucleotides (optimal length). To analyse whether such a sequence had promoter activity, we constructed several transcriptional fusions based on the gfp reporter gene (Fig. 3A). First, we inserted a 288-bp DNA fragment (coordinates 1,388,224 to 1,387,937) into the promoter-probe vector pASTT and introduced the recombinant plasmid (pASTT-PpclA) into the pneumococcal R6 strain. Measuring the fluorescence of the cultures, we did not detect significant differences in gfp expression between R6/pASTT (0.07 ± 0.01 units; background level) and R6/pASTT-PpclA (0.08 ± 0.01 units). However, when pASTT-PpclA was introduced into R6∆mga/pDLFpclR (high levels of pclR expression) and R6∆mga/pDLFpclR-i (low levels of pclR expression), we detected a higher level of gfp expression (~ 2.5-fold) in the strain with high levels of pclR expression (Fig. 3A). Similar results were obtained with plasmids pASTT-PpclA∆103 and pASTT-PpclA∆173, which allowed us to conclude that the 115-bp region between coordinates 1,388,051 and 1,387,937 contains a PclR-dependent promoter. No PclR-dependent promoter activity was detected (i) when the − 10 element of the PpclA promoter was deleted (from coordinate 1,387,974 to 1,387,937; plasmid pASTT-PpclA∆-10), and (ii) when a 30-bp region located upstream of the PpclA promoter was removed (from coordinate 1,388,051 to 1,388,021; plasmids pASTT-PpclA∆203 and pASTT-PpclA∆224) (Figs. 1, 3A). Finally, by primer extension assays (Fig. 3C), we confirmed that the PpclA promoter located on pASTT-PpclA∆103 (Fig. 3B) is functional.
We used total RNA from strain R6∆mga/pDLFpclR/pASTT-PpclA∆103 (high levels of pclR expression) and the oligonucleotide Int-gfp, which anneals to gfp transcripts […]. To determine the region protected by PclR-His on the non-coding strand, a 281-bp DNA fragment (coordinates 1,388,232 to 1,387,952) was radioactively labelled at the 5′-end of the non-coding strand (Fig. 4B). At 400 nM of PclR-His, major changes in DNase I sensitivity […] (Fig. 4C). This region contains the sequence (from position − 105 to − 75) that PclR needs to activate the PpclA promoter (Fig. 3A). Thus, we conclude that PclR activates transcription of the pclA gene by binding to a specific site upstream of the PpclA core promoter. Using the bend.it server (pongor.itk.ppke.hu/dna/bend_it.html), we calculated the bendability/curvature propensity plot of the 270-bp DNA fragment. The profile contains two potential intrinsic curvatures (~ 10-11 degrees per helical turn) within the PclR binding site (Supplementary Fig. S5). Intrinsic curvatures flanked by regions of bendability have also been predicted in DNA sites recognized by the MgaSpn transcriptional activator 17.
On both DNA strands and at 800 nM of PclR-His (Fig. 4A,B), regions protected against DNase I digestion were observed along the DNA fragment, which suggested that, upon binding to the primary site, additional PclR-His units interacted with the adjacent DNA regions. This result is consistent with the ability of PclR-His to generate multimeric complexes on linear double-stranded DNAs (Supplementary Fig. S6A), a feature previously reported for the MgaSpn transcriptional regulator 17,24. Specifically, we performed electrophoretic mobility shift assays (EMSAs) with the 270-bp DNA fragment that had been used in the DNase I footprinting assay. As shown in Supplementary Fig. S6A, the 32P-labelled DNA was incubated with different concentrations of PclR-His in the presence of non-labelled competitor calf thymus DNA. Free and bound DNAs were separated by electrophoresis on a native polyacrylamide (6%) gel. At 200 nM of PclR-His, free DNA and four protein-DNA complexes were detected. In addition, as the protein concentration was increased, such complexes disappeared and higher-order complexes appeared. This pattern of complexes suggested that multiple protein units bind in an orderly manner to the same linear DNA molecule.

PclR and MgaSpn have different DNA-binding specificities. According to EMBOSS Needle Pairwise Sequence Alignment 25,26, the N-terminal regions (first 170 amino acids) of PclR and MgaSpn share high sequence similarity (66.5% similarity and 50% identity). Both regions contain two predicted helix-turn-helix DNA-binding domains, the so-called HTH_Mga (residues 6 to 65) and Mga (residues 72 to 158) domains (Supplementary Figs. S1, S2, and S3). To determine whether MgaSpn recognized the PpclA promoter region, we performed DNase I footprinting assays using MgaSpn-His and the 270-bp DNA fragment. The 270-bp DNA fragment was radioactively labelled at the 5′-end of the coding strand (Fig. 5A). At 75 nM of MgaSpn-His, diminished DNase I cleavages were observed from position − 173 to − 196, and from − 102 to − 115. Moreover, positions − 47, − 69, − 87, and − 131 were slightly more sensitive to DNase I digestion (Fig. 5A,C). This result was confirmed in shorter electrophoretic runs (Supplementary Fig. S7). At higher MgaSpn-His concentrations, protections against DNase I digestion were observed along the entire DNA fragment (Fig. 5A), which is consistent with the pattern of protein-DNA complexes observed by EMSA (Supplementary Fig. S6B), and with the ability of MgaSpn to form multimeric complexes on linear DNA 17. The region protected by MgaSpn-His on the non-coding strand was defined using the 281-bp DNA fragment (Fig. 5B). […] (Fig. 5B,C). These results showed that PclR-His and MgaSpn-His recognize different sites on the PpclA promoter region (Fig. 6). […] (Fig. 6) remains unknown. MgaSpn activates the transcription of the spr1623-spr1626 operon by binding to a specific site upstream of the P1623B promoter (positions − 60 to − 99) 15,17. By DNase I footprinting assays, we also analysed whether PclR-His recognized the P1623B promoter region. We used a 222-bp DNA fragment (coordinates 1,598,298 to 1,598,519) that contains the P1623B promoter and the site recognized by MgaSpn 17.
Specific regions protected against DNase I digestion were not detected ( Supplementary Fig. S8), indicating that PclR-His does not recognize a specific site on the P1623B promoter region. This result correlated with the inability of PclR to influence the activity of the P1623B promoter. By qRT-PCR assays, we found similar levels of spr1623 transcripts in strains that produce different levels of PclR: R6/pDLFpclR (high levels of pclR expression) versus R6/pDLFpclR-i (low levels of pclR expression) (Supplementary Table S8), and R6∆mga/pDLFpclR (high levels of pclR expression) versus R6∆mga/pDLFpclR-i (low levels of pclR expression) (Supplementary Table S9).
Taken together, we conclude that PclR and MgaSpn have different DNA-binding specificities. They recognize different sites on the PpclA promoter region. Moreover, unlike MgaSpn, PclR does not bind to the P1623B promoter region. In agreement with these results, MgaSpn does not affect the activity of the PpclA promoter, and PclR does not affect the activity of the P1623B promoter.

Discussion
S. pneumoniae is an opportunistic pathogen able to proliferate in different niches of the human host. Its adaptation to new environments and host-imposed stresses partially relies on the activity of specific transcriptional regulators. The genome of the pneumococcal R6 strain has several gene clusters that are absent from other strains. One of these clusters contains two divergent genes, pclA, which encodes a putative cell surface protein 12 , and pclR, whose function has been investigated in this work. We have identified the promoter of each gene (PpclA and PpclR) and demonstrated that PclR functions as a transcriptional activator. It stimulates pclA transcription by binding to a specific site upstream of the PpclA core promoter. PclA is a collagen-like protein, which contains the peptidoglycan anchor LPXTG motif and several GXY amino acid repeats 12 . This repeating pattern is the most typical feature in the molecular architecture of bacterial collagen-like proteins 37 . In pathogenic streptococci, surface-exposed collagen-like proteins have been associated with processes of colonization, biofilm formation, and evasion of the host immune response 38 . In the case of PclA, Paterson et al. 12 reported that a pclA deletion mutant strain is defective in adherence and invasion of nasopharyngeal and epithelial cells in vitro. Thus, we speculate that PclR could have a regulatory role during pneumococcal colonization. Using the EMBOSS Needle Pairwise Sequence Alignment program 25,26 , we have found that PclR has sequence similarity (40.4%) to the Mga global regulator (530 residues; GenBank AAT87855.1) of the Gram-positive bacterium S. pyogenes (Group A Streptococcus; GAS). It has been reported that Mga regulates positively the transcription of the scl1 gene (also known as sclA) [39][40][41] . This gene encodes a collagen-like surface protein (Scl1) that interacts with integrins, cellular fibronectin, and laminin [42][43][44] . 
Moreover, it has been shown that Scl1 mediates GAS adherence to and internalization by human pharyngeal epithelial cells, playing an important role in pathogenesis 43 .
DNA rearrangements and gene acquisition are natural strategies for the generation of genetic diversity in S. pneumoniae, a feature that has been recently shown to be increased by the presence of temperate bacteriophages integrated into different regions of the pneumococcal chromosome 45. […] between pairs of pneumococcal isolates can diverge by as much as 30% 46. The sequences of the pneumococcal TIGR4 and R6 genomes were published in 2001 9,10. A comparison of the two sequences revealed the existence of strain-specific genes, many of which are organized in clusters. Specifically, the TIGR4 genome has twelve gene clusters (~ 7% of the total genome) that are not present in R6, and the R6 genome has six gene clusters (~ 3% of the total genome) that are absent from TIGR4 11. PCR analyses of the distribution of the R6-specific pclA-pclR gene cluster in a collection of clinical isolates revealed that many of such isolates lacked both genes (~ 60% of the strains examined) 12. Subsequently, pclA was found to be associated with Pneumococcal Molecular Epidemiology Network (PMEN) clones 13. Clones included in the PMEN are resistant to one or more antibiotics that are in wide clinical use. Moreover, they have a wide geographic distribution (https://www.pneumogen.net/pmen). Now, we have analysed whether the pclR gene was present in the 24 pneumococcal genomes shown in Supplementary Table S10. Such genomes are fully sequenced and assembled (NCBI database). Moreover, they encode a highly conserved MgaSpn regulator 15,21. Using the BLASTP protein sequence alignment program 47, we have found that only nine out of the 24 genomes encode PclR: strains ATCC 700669, A026, D39, JJA, INV104, ST556, Taiwan19F14, TCH8431/19A and 70585. The PclR regulator of these strains is identical or almost identical to the PclR regulator encoded by the R6 genome (Supplementary Table S10).
Like R6, the nine genomes also encode PclA.
A study based on RNA-seq revealed profound changes in the relative amount of the RNAs synthesized by the pneumococcal D39V strain under a wide range of infection-relevant conditions. The expression data as well as the co-expression matrix were published in the PneumoExpress database (https://veeninglab.com/pneumoexpress) 48. The D39V genome contains the pclA-pclR gene cluster (genes SPV_1376 and SPV_1377 in D39V). Searching in PneumoExpress, we have found that the highest expression level of pclA and pclR corresponds to bacteria grown in nose-mimicking conditions, which simulate colonization. Both genes were also highly expressed in bacteria grown in lung-mimicking conditions, which simulate pneumonia, and in cerebrospinal fluid-mimicking conditions from 37 to 40 °C, which simulate meningeal fever. In the case of the mgaSpn regulatory gene (SPV_1587) and its target operon spr1623-spr1626 (SPV_1588-SPV_1591), the highest expression level also corresponds to bacteria grown in nose-mimicking conditions. Hence, the expression data suggest that PclR and MgaSpn could play a significant role during nasopharyngeal colonization. Previous studies performed by Hemsley et al. 16 showed that a mgaSpn deletion mutant strain was attenuated for both nasopharyngeal carriage and pneumonia in murine infection models. Concerning the expression of the mgaSpn and pclR regulatory genes under standard laboratory conditions (this work), transcription of both genes was found to be higher in the logarithmic phase compared to the stationary phase. Most of the transcription processes in exponentially growing pneumococcal bacteria are initiated by the RNA polymerase that contains the housekeeping sigma factor SigA, also known as RpoD and σ43. In the promoters recognized by the housekeeping factor, the consensus sequence of the − 10 element is 5′-TATAAT-3′, which is present in the promoter of mgaSpn (Pmga) and the promoter of pclR (PpclR).
It has been shown that SigA recognizes the Pmga promoter in vitro 49 .
The pneumococcal MgaSpn transcriptional regulator is a member of the Mga/AtxA family [17][18][19], which also includes the global regulator MafR of E. faecalis 20. Here we have shown that PclR shares some features with MgaSpn. These proteins have nearly the same size (494 and 493 residues, respectively), exhibit a high degree of sequence similarity (60%), and have the same organization of predicted functional domains, including two N-terminal helix-turn-helix DNA-binding motifs. Furthermore, PclR can generate multimeric complexes on linear double-stranded DNA fragments, a feature reported first for MgaSpn 17 and later on for MafR 50. Regarding their mechanism to activate transcription from specific promoters, both proteins stimulate transcription by binding to a specific site upstream of the core promoter. PclR recognizes a site upstream of the PpclA promoter (positions − 68 to − 169), and MgaSpn activates transcription of a four-gene operon (spr1623-spr1626) by binding to a site upstream of the P1623B promoter (positions − 60 to − 99) 17. Nevertheless, despite these similarities, we have shown that PclR and MgaSpn have different DNA-binding specificities. PclR does not bind to the site recognized by MgaSpn on the P1623B promoter region, and MgaSpn does not bind to the site recognized by PclR on the PpclA promoter region. As a consequence, PclR does not influence the expression of the spr1623 gene, and MgaSpn does not influence the expression of the pclA gene.
In summary, the pclA-pclR gene cluster of the pneumococcal R6 strain is not present in all strains of the species. Our present work demonstrates that PclR is a transcriptional activator of the pclA gene (collagen-like protein). PclR recognizes a specific DNA site upstream of the PpclA core promoter. Moreover, PclR is homologous to the MgaSpn transcriptional regulator, which is also encoded by the R6 genome. Our study shows that PclR and MgaSpn have similar DNA-binding properties but different DNA-binding specificities.

Materials and methods
Oligonucleotides, bacterial strains, and plasmids. The oligonucleotides used in this work are listed in Table 1. S. pneumoniae strains R6 10 and R6∆mga 15 were used. R6∆mga lacks the mgaSpn regulatory gene. The pneumococcal strains R6∆mga/pDL287 (absence of MgaSpn) and R6∆mga/pDLPsulA::mga (plasmid-encoded MgaSpn) were described previously 15. Plasmid pDLF is a constitutive expression vector that carries a kanamycin resistance gene 20. This vector has an engineered unique restriction site for SphI downstream of the enterococcal P2493 promoter 34. Plasmids pDLFpclR and pDLFpclR-i are pDLF derivatives. For their construction, a 1594-bp region of the R6 chromosome was amplified by PCR using the FpclR and RpclR oligonucleotides. After SphI digestion, the 1561-bp restriction fragment was inserted into the SphI site of pDLF in both orientations; pDLFpclR is the recombinant plasmid that carries the pclR gene under the control of the P2493 promoter. Plasmid pASTT is a promoter-probe vector based on the gfp reporter gene 51. It is a pAST derivative 34 and carries a tetracycline resistance gene. The following pASTT-derivatives were constructed in this work. In all cases, a region of the R6 chromosome was amplified by PCR using the indicated primers. Then, the PCR product was digested […]
Growth and transformation of bacteria. Pneumococcal cells were grown in AGCH medium 34,53 supplemented with 0.3% sucrose and 0.2% yeast extract, at 37 °C in a static water bath. For plasmid-harbouring cells, the medium was supplemented with kanamycin (50 µg/ml; pDLF derivatives) and/or tetracycline (1 µg/ml; pASTT derivatives). The protocol used for natural transformation of S. pneumoniae was described previously 33. E. coli cells carrying a pET24b derivative were grown in tryptone-yeast extract (TY) medium supplemented with kanamycin (30 µg/ml), at 37 °C in a shaking water bath. The protocol used to transform E. coli by electroporation was described previously 54.
Overproduction and purification of His-tagged proteins. E. coli strains BL21(DE3)/pET24b-mgaSpn-His 15 and BL21(DE3)/pET24b-pclR-His (this work, see above) were used. The protocols used to overproduce and purify the MgaSpn-His protein were described previously 15 . MgaSpn-His purification involved the use of a HisTrap HP column (GE Healthcare) and a HiLoad Superdex 200 gel filtration column (Amersham). For overproduction and purification of the PclR-His protein, the protocols reported for MafR-His 50 were used. Basically, PclR-His purification included the following steps: (i) precipitation of nucleic acids with polyethyleneimine (PEI) (0.2%) in the presence of NaCl (300 mM). The ionic strength at which PEI precipitation was done was low enough to recover PclR-His in the PEI pellet, (ii) elution of PclR-His from the PEI pellet using a higher ionic strength buffer (700 mM NaCl), (iii) precipitation of the eluted proteins with 70% saturated ammonium sulphate, and (iv) fast-pressure liquid chromatography (Biologic Duoflow, Bio-Rad) on a nickel affinity column (HisTrap HP) ( Supplementary Fig. S9). Protein concentration was determined using a NanoDrop ND-2000 Spectrophotometer (Thermo Scientific).
DNA and RNA isolation. Genomic DNA from S. pneumoniae was prepared as reported 53 . Plasmid DNA was prepared using the High Pure Plasmid Isolation Kit (Roche Applied Science) as described 34 . Total RNA was isolated using the RNeasy Mini Kit (QIAGEN). Cultures were processed as specified by the supplier, except that cells were resuspended in a buffer that contained 50 mM Tris-HCl, pH 7.6, 1 mM EDTA, 50 mM NaCl, and 0.1% deoxycholate. Then, cells were incubated at 37 °C for 5 min (cell lysis). The integrity of rRNAs was analysed by

Polymerase chain reaction (PCR) and quantitative RT-PCR (qRT-PCR). Phusion High-Fidelity DNA Polymerase (Thermo Scientific) was used for all PCR applications as reported 34. PCR products were purified with the QIAquick PCR Purification Kit (QIAGEN). In the qRT-PCR assays, for each strain, total RNA was isolated from three independent bacterial cultures. Then, from each RNA preparation, cDNA was synthesized. For cDNA synthesis with random primers, the iScript Select cDNA Synthesis Kit (Bio-Rad) was used as described previously 20. To rule out the presence of genomic DNA in the RNA preparations, reactions without adding reverse transcriptase were performed. Quantitative PCRs were carried out using the iQ SYBR Green Supermix (Bio-Rad) and an iCycler Thermal Cycler (Bio-Rad) as reported 20. From each cDNA sample, three PCRs per gene (gene of interest and internal control gene) were performed. Data were analysed with the iQ™5 Optical System Software. Relative quantification of gene expression was performed using the comparative $C_T$ method 32. The era gene (spr0871) was used as the internal control gene (oligonucleotides Fera-q and Rera-q).
The oligonucleotides used to determine the relative expression of the pclR (FpclR-q and RpclR-q), pclA (FpclA-q and RpclA-q), mgaSpn (1622A and 1622J) and spr1623 (F1623-q and 1623B) genes are shown in Table 1. The threshold cycle values ($C_T$) of the gene of interest and the internal control gene were used to calculate $2^{-\Delta C_T}$, where $\Delta C_T = C_T(\text{gene of interest}) - C_T(\text{internal control})$. In general, for each cDNA sample (three in total), the mean $C_T$ from the three PCRs for the gene of interest, the mean $C_T$ from the three PCRs for the internal control gene, and the $2^{-\Delta C_T}$ value were calculated. Then, the mean ± standard deviation of the three $2^{-\Delta C_T}$ values was calculated. The differences between two groups were analysed using a Student's t-test (paired, two-tailed). For the gene of interest, the fold change in expression (FC) in one strain compared to another was obtained by dividing the corresponding mean $2^{-\Delta C_T}$ values. The results of these analyses are shown in Supplementary Tables S1 to S9 and Supplementary Fig. S10.
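The calculation described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the software used in this work: the function names are hypothetical and the threshold cycle values are invented so that the result is easy to check by hand. The t-test is omitted.

```python
from statistics import mean, stdev


def two_pow_minus_dct(ct_gene, ct_control):
    """2^-dCT for one cDNA sample, from its replicate PCR threshold cycles.

    dCT = mean CT (gene of interest) - mean CT (internal control gene).
    """
    delta_ct = mean(ct_gene) - mean(ct_control)
    return 2.0 ** (-delta_ct)


def summarize_strain(samples):
    """Mean and standard deviation of the 2^-dCT values of one strain.

    samples: list of (ct_gene_replicates, ct_control_replicates), one entry
    per independent cDNA sample (three per strain in the text).
    """
    values = [two_pow_minus_dct(gene, control) for gene, control in samples]
    return mean(values), stdev(values)


def fold_change(samples_a, samples_b):
    """Fold change of strain A relative to strain B (ratio of mean 2^-dCT)."""
    return summarize_strain(samples_a)[0] / summarize_strain(samples_b)[0]


# Invented CT values (gene of interest, internal control) for two strains.
high = [([20, 20, 20], [18, 18, 18]),
        ([21, 21, 21], [19, 19, 19]),
        ([20, 20, 20], [18, 18, 18])]   # dCT = 2 in every sample
low = [([22, 22, 22], [18, 18, 18]),
       ([23, 23, 23], [19, 19, 19]),
       ([22, 22, 22], [18, 18, 18])]    # dCT = 4 in every sample

print(summarize_strain(high))
print(fold_change(high, low))
```

In this toy data set the two strains differ by two cycles in $\Delta C_T$, so the fold change is $2^{2} = 4$; each additional cycle of difference doubles the ratio.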
Primer extension. The ThermoScript Reverse Transcriptase enzyme (Invitrogen) and [α-32P]-dATP (3000 Ci/mmol; PerkinElmer) were used as reported 15. Basically, primer extension reactions (20 µl) contained 1 pmol of the indicated oligonucleotide and 2.5-15 µg of total RNA from the indicated strain. To anneal the primer with the transcript, samples were incubated at 65 °C for 5 min. Extension reactions were carried out at 55 °C for 45 min. After heating at 85 °C for 5 min, non-incorporated [α-32P]-dATP was removed using Illustra MicroSpin™ G-25 columns (GE Healthcare). Samples were ethanol precipitated as described 51. cDNA products were analysed by sequencing gel (8 M urea, 6% polyacrylamide) electrophoresis. As DNA size markers, dideoxy sequencing reactions were carried out using M13mp18 DNA, primer − 40 M13 36, and the Sequenase Version 2.0 DNA Sequencing kit (USB Corporation). Labelled products were visualized using a Fujifilm Image Analyser FLA-3000.
Fluorescence assays. Pneumococcal cells harbouring a pASTT derivative were grown as indicated above to an optical density at 650 nm (OD 650 ) of 0.3-0.4 (exponential phase). Then, cultures were processed as reported 51 . Fluorescence intensity was measured using a Thermo Scientific Varioskan Flash instrument.
DNase I footprinting assays. Oligonucleotides were 32P-labelled at the 5′-end as described 17. PCR amplification using a 32P-labelled oligonucleotide was used to obtain double-stranded DNA fragments labelled at the 5′-end of one of the strands. Three regions of the R6 chromosome were amplified by PCR: (a) a 270-bp region (coordinates 1,388,196 to 1,387,927) using the Up1404 and Dw1404 oligonucleotides, (b) a 281-bp region (coordinates 1,388,232 to 1,387,952) using the Up1404-2 and Dw1404-2 oligonucleotides, and (c) a 222-bp region (coordinates 1,598,298 to 1,598,519) using the 1622H and 1622I oligonucleotides. Binding reactions and DNase I digestion were performed as described 51. Samples were analysed by sequencing gel (6% polyacrylamide, 8 M urea) electrophoresis. Labelled products were visualized using a Fujifilm Image Analyser FLA-3000 and the intensity of the bands was quantified using the Quantity One software (Bio-Rad).
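The fragment sizes quoted above follow directly from the chromosomal coordinates, since both end coordinates are part of the fragment. A minimal sketch of that check is given below; the helper names are hypothetical, and coordinates are accepted in either order because fragments on the two strands are written in opposite directions in the text.

```python
def parse_coordinate(text: str) -> int:
    """Convert a coordinate written with thousands separators, e.g. '1,388,196'."""
    return int(text.replace(",", ""))


def fragment_length(start: str, end: str) -> int:
    """Length in bp of a fragment delimited by two inclusive coordinates."""
    return abs(parse_coordinate(end) - parse_coordinate(start)) + 1


# The three PCR-amplified regions used in the footprinting assays.
regions = {
    "a": ("1,388,196", "1,387,927"),
    "b": ("1,388,232", "1,387,952"),
    "c": ("1,598,298", "1,598,519"),
}
for name, (start, end) in regions.items():
    print(name, fragment_length(start, end))
```

The three regions give 270, 281, and 222 bp, in agreement with the sizes stated in the text.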
Electrophoretic mobility shift assays. Binding reactions were performed as described 50. When indicated, non-labelled competitor calf thymus DNA and 32P-labelled DNA were added simultaneously to the binding reaction. Reaction mixtures were analysed by electrophoresis on native polyacrylamide (6%) gels.

Data availability
All data generated and analysed during this study are included in this Manuscript and the Supplementary Information file. The sequences of genes and proteins analysed in the current study are available in the NCBI database: Locus_tag = SPR_RS06975 (old_locus_tag = spr1404).